Path: keats.ugrad.cs.ubc.ca!not-for-mail
From: c2a192@ugrad.cs.ubc.ca (Kazimir Kylheku)
Newsgroups: comp.lang.c,comp.std.c,comp.lang.c++
Subject: Re: Floating Point arithmetic problem
Date: 14 Feb 1996 19:27:01 -0800
Organization: Computer Science, University of B.C., Vancouver, B.C., Canada
Message-ID: <4fu965INNq3g@keats.ugrad.cs.ubc.ca>
References: <c968da6jzm.fsf@damayanti.india.ti.com>
NNTP-Posting-Host: keats.ugrad.cs.ubc.ca

In article <c968da6jzm.fsf@damayanti.india.ti.com>,
Kuntal Shah <kuntal@india.ti.com> wrote:
>
>I am having a weird problem with floating point arithmetic. Gurus on
>the net, please bail me out. I am working on the SUN 4.1.x platform.
>
>I have a "double" variable say d to which I need to add certain float
>numbers of moderate magnitude (say less than 10000). This addition
>occurs in a loop in my program which gets executed more than a million
>times depending on my testcase.
>
> { /* loop begin */
>
>   /* some code */
>
>   d = d + f; /* f < 10000 */

Ouch. Don't do that. The accumulated truncation error will kill ya.
This can be fixed by using an integer counter which you scale into a
floating point value, like this:

    #define STEPS 100000

    {
        int i;      /* loop counter */
        double t;   /* parameter */

        for (i = 0; i < STEPS; i++) {
            t = (double) i / STEPS;
            /* ... */
        }
    }

Here the loop counter iterates through 100,000 steps, and the floating
point parameter t varies from 0 up to (but not including) 1. It does
not suffer from cumulative truncation errors because it is recalculated
from the exact integer value each time by a single division, so each
value of t carries only one rounding error.

>Now coming to the problem. The insignificant digits due to the
>floating point representation keep accruing and there comes a stage
>when the accrued value exceeds 0.0001 which results in failure of the
>if condition in the above block of code, when ideally no such thing
>should have occurred.
>
>All I need is a solution that will overcome this problem.
>Please bear in mind that the loop is executed millions of times and
>hence any costly operation within the loop will drastically bring
>down performance.
>
>I have a few options to overcome this problem :-
>
>* After each addition, convert 'd' to an unsigned long after
>  multiplying by say 1e8, (thus truncating the unnecessary digits),
>  and divide it by 1e8 to get back the original value.
>
>* After every few additions, say 1000, do the above operation.
>
>In both the above operations, a severe problem would arise in cases
>when the value represented is less than the value asked for. For
>example,
>
>     f= 213.22 would be represented as 213.2199999999999988631316228
>
>and cutting off the last few digits would result in negative accrual
>in the wrong run. Since I am not sure of the value of f till run time,
>I cannot solely depend on +ve or -ve accrual to happen.
>
>Do you have any solution to this problem? Ideally I would like the
>following answers :-

Try the approach above: use an integer as the loop parameter. A 32-bit
integer has plenty of range for any number of iterations you are likely
to attempt, and it can be converted to floating point quite readily. On
most workstations, the double type is 64 bits wide with a 52-bit
mantissa (53 bits counting the implicit leading bit), which can
accommodate any 32-bit integer without truncation, so you should be OK
regardless of what range of integers you convert to floating point.

>* Is it possible to use bit wise operators (since they are lot faster
>  than other computations) to remove the least significant bits? I
>  tried doing this but wasn't all that effective.

I strongly discourage you from even contemplating this. Floating point
representations can vary wildly from architecture to architecture. The
bit operators are really intended for unsigned integer operands.

>* Is it possible to set to zero, say the last 10-15 digits of the
>  decimal part without any effect in the long run on the 5 digit
>  precision I require?

Not easily.
For one thing, the resulting number may not be representable in the
machine's floating point format. The floating point format is typically
binary, not decimal, and numbers that have terminating decimal digits
in base ten may have repeating digits in base two.

>* Is there any function that can round numbers off to the required
>  precision, ie, can I specify 0.66666666666623345 to be rounded off
>  to 0.666666666667 without undergoing the usual multiply, truncate,
>  divide flow.

No. My advice: buy an undergraduate-level textbook on Numerical
Analysis. A good book will explain floating point formats, coping with
rounding and truncation errors and so forth, usually in the first
chapter.
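
To make the contrast concrete, here is a minimal sketch of the two
approaches, assuming the increment is the same on every iteration. The
function names accumulate and recompute and the step value 0.1 used in
the note below are made up for illustration; they are not from the
original question.

```c
#include <assert.h>

/* Running sum: add `step` to an accumulator `n` times.
   Every addition is rounded, and the rounding errors build up. */
double accumulate(double step, long n)
{
    double d = 0.0;
    long i;

    assert(n >= 0);
    for (i = 0; i < n; i++)
        d = d + step;
    return d;
}

/* Recompute from the exact integer counter instead.
   The result carries the rounding error of a single multiplication. */
double recompute(double step, long n)
{
    assert(n >= 0);
    return (double) n * step;
}
```

With a typical 64-bit IEEE 754 double, accumulate(0.1, 1000000L) should
come out slightly off from 100000, while recompute(0.1, 1000000L) stays
within one rounding of it. This only helps when the value you want can
be derived from the counter; if the increment changes unpredictably on
each pass, the running sum cannot be replaced this way.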